<!DOCTYPE html>
<html>
    <head>
        <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
        <title>Store SpamAssassin bayes in SQL</title>
        <link rel="stylesheet" type="text/css" href="./css/markdown.css" />
    </head>
    <body>

    <div id="navigation">
    <a href="https://www.iredmail.org" target="_blank">
        <img alt="iRedMail web site"
             src="./images/logo-iredmail.png"
             style="vertical-align: middle; height: 30px;"
             />&nbsp;
        <span>iRedMail</span>
    </a>
    &nbsp;&nbsp;//&nbsp;&nbsp;<a href="./index.html">Document Index</a></div><h1 id="store-spamassassin-bayes-in-sql">Store SpamAssassin bayes in SQL</h1>
<div class="admonition attention">
<p class="admonition-title">Attention</p>
<p>Check out the lightweight on-premises email archiving software developed by iRedMail team: <a href="https://spiderd.io/">Spider Email Archiver</a>.</p>
</div>
<div class="toc">
<ul>
<li><a href="#store-spamassassin-bayes-in-sql">Store SpamAssassin bayes in SQL</a><ul>
<li><a href="#summary">Summary</a></li>
<li><a href="#create-required-sql-database-used-to-store-bayes-data">Create required SQL database used to store bayes data</a></li>
<li><a href="#enable-bayes-modules-in-spamassassin">Enable Bayes modules in SpamAssassin</a></li>
<li><a href="#check-number-of-learned-spamshams">Check number of learned spams/hams</a></li>
<li><a href="#auto-learn-spamham-with-dovecot-imap_sieve-plugin">Auto learn spam/ham with Dovecot imap_sieve plugin</a></li>
<li><a href="#backup-your-sa_bayes-database">Backup your sa_bayes database</a></li>
<li><a href="#references">References</a></li>
</ul>
</li>
</ul>
</div>
<div class="admonition warning">
<p class="admonition-title">Warning</p>
<p>The bayesian classifier can only score new messages after it already learn
200 known spams and 200 known hams.</p>
</div>
<h2 id="summary">Summary</h2>
<p>This article will guide you to configure related components to store
SpamAssassin Bayes data in SQL server, and allow webmail users to report spam
with one click.</p>
<p>Tested with:</p>
<ul>
<li>iRedMail-0.8.0 and later releases</li>
<li>CentOS 6.2 (x86_64)</li>
<li>SpamAssassin-3.3.1</li>
<li>Amavisd-new-2.6.6</li>
<li>MySQL-5.1.61</li>
<li>Roundcubemail-0.7.2</li>
</ul>
<p>Notes:</p>
<ul>
<li>This article should work with all iRedMail releases. We take iRedMail-0.8.0 for example.</li>
<li>This article should work with all backends: OpenLDAP, MySQL, MariaDB, PostgreSQL. We take MySQL backend for example.</li>
<li>This article should work with Amavisd-new-2.6.0 and later versions.</li>
</ul>
<p><strong>IMPORTANT NOTES</strong>:</p>
<ul>
<li>The bayesian classifier can only score new messages after it already learn
  known spams and 200 known hams.</li>
<li>If Spamassassin fails to identify a spam, teach it so it can do better next
  time. e.g. Mark it as spam in roundcube webmail.</li>
<li>Read <code>References</code> section at the end of this article before asking questions.</li>
</ul>
<h2 id="create-required-sql-database-used-to-store-bayes-data">Create required SQL database used to store bayes data</h2>
<p>We need to create a SQL database and necessary tables to store SpamAssassin
bayes data. The RPM package installed on CentOS 6 doesn't ship SQL template
for bayes database, so we have to download it from Apache web site. We're
running SpamAssassin-3.3.1, so what we need is this SQL template file:
<a href="http://svn.apache.org/repos/asf/spamassassin/tags/spamassassin_release_3_3_1/sql/bayes_mysql.sql">http://svn.apache.org/repos/asf/spamassassin/tags/spamassassin_release_3_3_1/sql/bayes_mysql.sql</a>
If you're running different version, please find the proper SQL file here:
<a href="http://svn.apache.org/repos/asf/spamassassin/tags/">http://svn.apache.org/repos/asf/spamassassin/tags/</a>.</p>
<pre><code># cd /root/
# wget http://svn.apache.org/repos/asf/spamassassin/tags/spamassassin_release_3_3_1/sql/bayes_mysql.sql
</code></pre>
<p>Create MySQL database and import SQL template file:</p>
<pre><code># mysql -uroot -p
mysql&gt; CREATE DATABASE sa_bayes;
mysql&gt; USE sa_bayes;
mysql&gt; SOURCE /root/bayes_mysql.sql;
</code></pre>
<p>Create a new MySQL user (with password <code>sa_user_password</code>) and grant
permissions. <strong>IMPORTANT NOTE</strong>: Please replace password <code>sa_user_password</code>
by your own password.</p>
<pre><code>mysql&gt; GRANT SELECT, INSERT, UPDATE, DELETE ON sa_bayes.* TO sa_user@localhost IDENTIFIED BY 'sa_user_password';
mysql&gt; FLUSH PRIVILEGES;
</code></pre>
<h2 id="enable-bayes-modules-in-spamassassin">Enable Bayes modules in SpamAssassin</h2>
<p>Edit <code>/etc/mail/spamassassin/local.cf</code>, add (or modify below settings):</p>
<pre><code>use_bayes          1
bayes_auto_learn   1
bayes_auto_expire  1

# Store bayesian data in MySQL
bayes_store_module Mail::SpamAssassin::BayesStore::MySQL
bayes_sql_dsn      DBI:mysql:sa_bayes:127.0.0.1:3306

# Store bayesian data in PostgreSQL
#bayes_store_module Mail::SpamAssassin::BayesStore::PgSQL
#bayes_sql_dsn      DBI:Pg:sa_bayes:127.0.0.1:5432

bayes_sql_username sa_user
bayes_sql_password sa_user_password

# Override the username used for storing data in the database.
# This could be used to group users together to share bayesian filter data.
# You can also use this config option to trick sa-learn to learn data as a
# specific user.
#
# In iRedMail, SpamAssassin is called by Amavisd, so we must set it to be
# same as Amavisd daemon user:
#   - on Linux, it's user `amavis`.
#   - on FreeBSD, it's user `vscan`.
#   - on OpenBSD, it's user `_vscan`.
bayes_sql_override_username amavis
</code></pre>
<p>Make sure SpamAssassin will load bayes modules:</p>
<pre><code># /etc/init.d/amavisd stop
# amavisd -c /etc/amavisd/amavisd.conf debug 2&gt;&amp;1 | grep -i 'bayes'
May 16 09:59:33 ... SpamAssassin loaded plugins: ..., Bayes, ...
May 16 10:27:38 ... extra modules loaded after daemonizing/chrooting:
    Mail/SpamAssassin/BayesStore/MySQL.pm, Mail/SpamAssassin/BayesStore/SQL.pm, ...
</code></pre>
<p>Looks fine, now press <code>Ctrl-C</code> to terminate above command, and start Amavisd
service again normally:</p>
<pre><code># /etc/init.d/amavisd restart
</code></pre>
<p>It is required to initialize the database by learning a message. We use the
sample spam email shipped in the RPM package provided by CentOS 6:</p>
<pre><code># rpm -ql spamassassin | grep 'sample-spam'
/usr/share/doc/spamassassin-3.3.1/sample-spam.txt

# sa-learn --spam --username=amavis /usr/share/doc/spamassassin-3.3.1/sample-spam.txt
Learned tokens from 1 message(s) (1 message(s) examined)
</code></pre>
<h2 id="check-number-of-learned-spamshams">Check number of learned spams/hams</h2>
<p>Run SQL commands below to check how many spams/hams have been learnt (note:
the numbers will be different on your server):</p>
<pre><code>mysql&gt; USE sa_bayes;
mysql&gt; SELECT username,spam_count,ham_count FROM bayes_vars;
+----------+------------+-----------+
| username | spam_count | ham_count |
+----------+------------+-----------+
| amavis   |          3 |        38 |
+----------+------------+-----------+
</code></pre>
<h2 id="auto-learn-spamham-with-dovecot-imap_sieve-plugin">Auto learn spam/ham with Dovecot imap_sieve plugin</h2>
<ul>
<li><a href="./dovecot.imapsieve.html">Auto learn spam/ham with Dovecot imap_sieve plugin</a></li>
</ul>
<h2 id="backup-your-sa_bayes-database">Backup your <code>sa_bayes</code> database</h2>
<p>Don't forget to update <code>/var/vmail/backup/backup_mysql.sh</code> (or
<code>backup_pgsql.sh</code> if you're running PostgreSQL backend) to backup this newly
created <code>sa_bayes</code> database. For example:</p>
<pre><code># Part of file `/var/vmail/backup/backup_mysql.sh`
DATABASES='... sa_bayes'
</code></pre>
<h2 id="references">References</h2>
<ul>
<li><a href="http://wiki.apache.org/spamassassin/BayesInSpamAssassin">Bayes Introduction</a>. Please do read section <code>Things to remember</code>.</li>
<li><a href="http://wiki.apache.org/spamassassin/BayesFaq">SpamAssassin Bayes Frequently Asked Questions</a></li>
</ul><div class="footer">
    <p style="text-align: center; color: grey;">All documents are available in <a href="https://github.com/iredmail/docs/">GitHub repository</a>, and published under <a href="http://creativecommons.org/licenses/by-nd/3.0/us/" target="_blank">Creative Commons</a> license. You can <a href="https://github.com/iredmail/docs/archive/master.zip">download the latest version</a> for offline reading. If you found something wrong, please do <a href="https://www.iredmail.org/contact.html">contact us</a> to fix it.</p>
</div></body></html>